Abstract

This paper presents a new backup system named TE-CDP, which contains three methods. First, we propose the snapshot flow method, which is based on the twin-engine CDP and makes incremental backups of snapshots based on points in time, saving storage space. Second, to ensure real-time data protection, we propose the single mirroring method. Finally, because snapshots have poor consistency, we present the consistency agent method of CDP, which ensures the consistency of CDP. Experiments show that TE-CDP can protect data and recover snapshots within seconds.
Introduction

With the advances in networked information services, data protection and recovery have become a hot topic (Jing Yang et al., 2012; Zhichao Li et al., 2012). Snapshots ensure data consistency in backup systems, while continuous data protection (CDP) offers a finer time granularity of recovery points than snapshots.

Nowadays, most backup systems use both snapshots and CDP. The backup system of Huazhong University of Science and Technology used the ST-CDP mechanism to optimize the traditional Timely Recovery to Any Point-in-time (TRAP) architecture (Jing Yang et al., 2012). Nankai University embedded CDP into LVM, an approach they called SnapCDP (Feng Wang et al., 2009).

However, existing backup systems still have the following problems:

■ wasted storage space;

■ poor real-time performance of data protection;

■ poor consistency of snapshot data.

To solve these problems, we propose the TE-CDP backup system, which uses three methods:

■ Snapshot Flow Method: It is based on the twin-engine CDP (copy-on-write, COW, and redirect-on-write, ROW) and saves storage space by making incremental backups of snapshots based on points in time.

■ Single Mirroring Method: It avoids I/O competition between TE-CDP and ordinary applications, and it ensures real-time data protection while backup data is being synchronized.

■ Consistency Agent Method of CDP: It ensures the data consistency of CDP.

The paper is organized as follows. We discuss related work in the next section. Section 3 presents the architecture of TE-CDP and introduces the three methods used in our system. Section 4 presents the two major function modules of TE-CDP. Section 5 evaluates TE-CDP. Section 6 concludes the paper and outlines our future work.

Related Work 

Over the years, many researchers have improved backup systems in different ways.

Some of them improved their backup systems by increasing the efficiency or reducing the storage space of CDP systems. In (Maohua Lu et al., 2011), a high-performance index update mechanism is presented to reduce the memory occupied by a block-level CDP system. In (Xiao Li et al., 2011), convex point SNAPshot (CSNAP) is presented; it requires less than 10% of the storage space of the traditional snapshot method.

Others used deduplication to save storage space. In (Stephen Smaldone et al., 2013), deduplication is used to reduce the space used by the file system and its metadata when protecting virtual machines. In (Wei Zhang et al., 2013), the authors used a low-cost deduplication scheme that reduces the CPU and memory usage of each VM in a cloud storage backup system.

None of these systems considers both saving the storage space of the backup system and the data consistency of the system. We present a backup system called TE-CDP. It uses the snapshot flow method, which makes incremental backups of snapshots based on points in time, to save storage space. It ensures real-time data protection by using the single mirroring method, and it ensures the consistency of CDP by using the consistency agent method of CDP.

System Design 

In this section, we discuss the architecture of TE-CDP and the backup methods used in our system.

The Architecture of TE-CDP 

The most difficult part of designing a backup system is ensuring correct and efficient data protection and recovery with limited bandwidth. We use the LAN-free backup idea to design the architecture shown in Fig. 1.



FIG. 1 THE ARCHITECTURE OF TE-CDP 
As Fig. 1 shows, TE-CDP has four layers. The application layer (the source side of backup) contains the computers to be protected. The scheduling protective layer (the backup server) contains the function modules of the system. The link layer, usually consisting of switches or hubs, is the communication bridge between the application layer and the scheduling protective layer. The physical storage layer is a disk array that maps disks to the scheduling protective layer.
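The four-layer layout can be summarized in a small Python model for orientation. This is only a sketch: the `Layer` class, `LAYERS` list, and `backup_path` function are hypothetical names, and we assume that backup writes travel from the protected host over the link layer to the backup server, which then writes them to the disk array.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    name: str
    role: str


# The four layers of Fig. 1, listed in the order backup data crosses them.
# Hypothetical model: names and roles paraphrase the text, not TE-CDP code.
LAYERS: List[Layer] = [
    Layer("application", "source side of backup; the computers to protect"),
    Layer("link", "switches or hubs between the source side and the server"),
    Layer("scheduling protective", "backup server running the function modules"),
    Layer("physical storage", "disk array whose disks are mapped to the server"),
]


def backup_path() -> List[str]:
    """Layers a backup write traverses, from the protected host to disk.

    Assumption: in the LAN-free design this traffic uses the dedicated
    link layer rather than the production LAN.
    """
    return [layer.name for layer in LAYERS]
```

Calling `backup_path()` lists the layers from the application layer down to the physical storage layer.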

Snapshot Flow Method 

The snapshot flow method runs on the backup server to solve the problem of wasted storage space. It makes incremental backups of snapshots based on points in time.

This method is based on the twin-engine CDP. The CDP engine (COW or ROW) is set when a backup is added, and TE-CDP then marks CDP according to that engine.
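The difference between the two engines can be illustrated with a toy block volume in Python. This is a generic sketch of copy-on-write and redirect-on-write snapshots, not TE-CDP's implementation; the class and method names are hypothetical, and a volume is modeled as a dictionary from block address (LBA) to bytes.

```python
from typing import Dict, List, Optional


class CowVolume:
    """Copy-on-write: writes update blocks in place; before the first
    overwrite of a block after a snapshot, its old content is saved."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self.blocks: Dict[int, bytes] = dict(blocks)  # live data
        # deltas[i]: pre-write contents of blocks first changed after
        # snapshot i (None marks a block that did not exist yet)
        self.deltas: List[Dict[int, Optional[bytes]]] = []

    def snapshot(self) -> int:
        self.deltas.append({})  # nothing is copied when the snapshot is taken
        return len(self.deltas) - 1

    def write(self, lba: int, data: bytes) -> None:
        if self.deltas and lba not in self.deltas[-1]:
            self.deltas[-1][lba] = self.blocks.get(lba)  # save old content once
        self.blocks[lba] = data

    def read_snapshot(self, snap_id: int) -> Dict[int, bytes]:
        image: Dict[int, Optional[bytes]] = dict(self.blocks)
        # Undo later changes, newest delta first, so deltas[snap_id] wins.
        for delta in reversed(self.deltas[snap_id:]):
            image.update(delta)
        return {lba: d for lba, d in image.items() if d is not None}


class RowVolume:
    """Redirect-on-write: every write goes to a new physical block; a
    snapshot freezes a copy of the logical-to-physical map, not the data."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self.store: List[bytes] = []          # append-only physical blocks
        self.live: Dict[int, int] = {}        # LBA -> index into store
        self.maps: List[Dict[int, int]] = []  # one frozen map per snapshot
        for lba, data in blocks.items():
            self.write(lba, data)

    def snapshot(self) -> int:
        self.maps.append(dict(self.live))
        return len(self.maps) - 1

    def write(self, lba: int, data: bytes) -> None:
        self.store.append(data)               # old block is left untouched
        self.live[lba] = len(self.store) - 1  # redirect the live map

    def read_snapshot(self, snap_id: int) -> Dict[int, bytes]:
        return {lba: self.store[p] for lba, p in self.maps[snap_id].items()}
```

Neither engine copies unchanged data when a snapshot is taken: COW saves an old block only when it is first overwritten, and ROW saves only a copy of the block map. The trade-off is where the extra work lands: COW pays an extra copy on the first write to a block, while ROW keeps writes cheap but scatters the live data over time.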

Single Mirroring Method 

The single mirroring method runs on the source side of backup to ensure real-time data protection. It also avoids I/O competition between TE-CDP and ordinary applications.

We designed a mechanism for this method that avoids I/O competition while backup data is being synchronized: every I/O request waits for the I/O requests issued before it, and ordinary applications have the initiative and priority in I/O.
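One way to realize these two rules is a pair of FIFO queues, sketched below in Python. The class and field names are hypothetical, and we assume that "single mirroring" means each application write produces exactly one mirrored copy for the backup server; the original does not describe its queueing structure.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass(frozen=True)
class IORequest:
    seq: int      # arrival order of the originating application write
    target: str   # "local" (application I/O) or "mirror" (backup copy)
    lba: int
    data: bytes


class MirrorScheduler:
    """Two FIFO queues: application I/O is always dispatched first;
    mirrored writes are dispatched only when no application I/O waits."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._app: Deque[IORequest] = deque()
        self._mirror: Deque[IORequest] = deque()

    def submit_write(self, lba: int, data: bytes) -> None:
        seq = self._next_seq
        self._next_seq += 1
        self._app.append(IORequest(seq, "local", lba, data))
        # The single mirror copy queues behind every earlier mirror write,
        # so the backup side sees writes in their original order.
        self._mirror.append(IORequest(seq, "mirror", lba, data))

    def dispatch(self) -> Optional[IORequest]:
        """Return the next request to issue, or None if both queues are empty."""
        if self._app:
            return self._app.popleft()
        if self._mirror:
            return self._mirror.popleft()
        return None
```

Strict priority keeps application latency low, but under sustained application load the mirror queue can grow without bound, so a practical scheduler would likely cap the backlog or reserve some bandwidth for mirror I/O.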

Consistency Agent Method of CDP 

The consistency agent method of CDP also runs on the source side of backup. It ensures the consistency of CDP.

In this method, we designed a mechanism that flushes cached data to disk. When the listener receives a command from the CDP scheduler, it informs the daemon to flush the cache to disk, and the backup server is then informed to mark CDP.
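The listener and daemon interaction can be sketched in Python as follows. The command string, class name, and callback are hypothetical; we assume the cache to be flushed consists of files open on the source side, flushed with the standard `flush()` and `os.fsync()` calls.

```python
import os
from typing import BinaryIO, Callable, Iterable, List


class ConsistencyAgent:
    """Hypothetical source-side agent: a listener that reacts to the CDP
    scheduler, plus a daemon role that flushes cached writes to disk."""

    MARK_COMMAND = "MARK_CDP"  # hypothetical command name

    def __init__(self, files: Iterable[BinaryIO],
                 notify_backup_server: Callable[[str], None]) -> None:
        self.files: List[BinaryIO] = list(files)
        self.notify_backup_server = notify_backup_server

    def flush_cache(self) -> None:
        """Daemon role: push buffered data all the way to the disk."""
        for f in self.files:
            f.flush()             # user-space buffer -> operating system
            os.fsync(f.fileno())  # OS page cache -> storage device

    def on_command(self, command: str) -> bool:
        """Listener role: flush first, then tell the server it may mark CDP."""
        if command != self.MARK_COMMAND:
            return False
        self.flush_cache()
        self.notify_backup_server(self.MARK_COMMAND)
        return True
```

The ordering is the point: the backup server is notified only after `os.fsync()` has returned, so the CDP mark it records covers data that has actually reached the disk. Caches held inside applications (for example, a database's buffer pool) are outside this sketch.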

The Major Function Modules

This section introduces the two major function modules of TE-CDP: the data protection module and the snapshot recovery module.

The Data Protection Module 

In this module, we use the CDP storage volume to store snapshots, and we use the three methods introduced in the previous section to achieve real-time and consistent backup.

The module's working process is shown in Fig. 2. 

The process can be summarized as follows:

1. Request space for the CDP storage volume on the backup server.

2. Set the CDP engine of the CDP storage volume, which must be either COW or ROW.

3. Bind the CDP storage volume to a logical volume on the backup server.

4. Map the logical volume to the source side of backup.

5. Use the single mirroring method to synchronize backup data.

6. Once synchronization is complete, the backup server marks CDP whenever it receives a command from the consistency agent.
FIG. 2 WORKING PROCESS OF THE DATA PROTECTION MODULE
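The ordering constraints among the six steps can be made explicit with a small state-checking class. This is a hypothetical Python model, not TE-CDP's interface: it records only which steps have happened and rejects a CDP mark until synchronization has finished.

```python
from enum import Enum
from typing import List, Optional


class Engine(Enum):
    COW = "copy-on-write"
    ROW = "redirect-on-write"


class DataProtectionSetup:
    """Hypothetical model of steps 1-6; it tracks order, not real volumes."""

    def __init__(self) -> None:
        self.size_gb: Optional[int] = None
        self.engine: Optional[Engine] = None
        self.logical_volume: Optional[str] = None
        self.source_host: Optional[str] = None
        self.synced = False
        self.marks: List[int] = []

    @staticmethod
    def _require(ok: bool, message: str) -> None:
        if not ok:
            raise RuntimeError(message)

    def allocate(self, size_gb: int) -> None:  # step 1
        self.size_gb = size_gb

    def set_engine(self, engine: Engine) -> None:  # step 2
        self._require(self.size_gb is not None, "allocate the CDP volume first")
        if not isinstance(engine, Engine):
            raise ValueError("engine must be Engine.COW or Engine.ROW")
        self.engine = engine

    def bind(self, logical_volume: str) -> None:  # step 3
        self._require(self.engine is not None, "set the CDP engine first")
        self.logical_volume = logical_volume

    def map_to_source(self, host: str) -> None:  # step 4
        self._require(self.logical_volume is not None, "bind a logical volume first")
        self.source_host = host

    def finish_sync(self) -> None:  # step 5: single mirroring completes
        self._require(self.source_host is not None, "map the volume first")
        self.synced = True

    def on_agent_command(self) -> int:  # step 6
        self._require(self.synced, "initial synchronization not complete")
        self.marks.append(len(self.marks))
        return self.marks[-1]
```

For example, calling `on_agent_command()` on a fresh `DataProtectionSetup` raises `RuntimeError`, while the sequence `allocate`, `set_engine`, `bind`, `map_to_source`, `finish_sync` makes the first call return mark 0.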

The Snapshot Recovery Module 

This module relies on the CDP engine set for the CDP storage volume. We use that engine's mechanism to recover the specified snapshot.

The module's working process is shown in Fig. 3. 



FIG. 3 WORKING PROCESS OF THE SNAPSHOT RECOVERY MODULE

The process can be summarized as follows:

1. Specify the snapshot to recover and the channel to which it should be recovered.

2. Use the CDP engine mechanism of the CDP storage volume to recover the snapshot to a virtual volume.

3. Map the virtual volume to the specified channel.

4. Use the recovered virtual volume to keep your business running.
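Steps 1 to 4 can be sketched in Python, assuming the recovered image is exposed as a writable virtual volume whose writes go to a private overlay so that the snapshot itself is never modified. The overlay design, the class and function names, and the dictionary standing in for channel mappings are all assumptions for illustration.

```python
from types import MappingProxyType
from typing import Dict, Mapping


class VirtualVolume:
    """Writable view over a frozen snapshot image (hypothetical design).

    Reads fall through to the snapshot unless the block has been rewritten;
    writes go to a private overlay, so the snapshot is never modified.
    """

    def __init__(self, snapshot_image: Mapping[int, bytes]) -> None:
        self._base = MappingProxyType(dict(snapshot_image))  # read-only copy
        self._overlay: Dict[int, bytes] = {}

    def read(self, lba: int) -> bytes:
        if lba in self._overlay:
            return self._overlay[lba]
        return self._base.get(lba, b"")  # unallocated blocks read as empty

    def write(self, lba: int, data: bytes) -> None:
        self._overlay[lba] = data


def recover_snapshot(snapshots: Mapping[int, Mapping[int, bytes]],
                     snap_id: int, channel: str,
                     channel_map: Dict[str, VirtualVolume]) -> VirtualVolume:
    """Take the specified snapshot, build a virtual volume from it,
    and map that volume to the specified channel."""
    if snap_id not in snapshots:
        raise KeyError(f"no snapshot with ID {snap_id}")
    volume = VirtualVolume(snapshots[snap_id])
    channel_map[channel] = volume  # stand-in for mapping to a channel
    return volume  # the business continues on this volume
```

Because writes never touch the snapshot image, the same snapshot can be recovered again later, for example to a second channel.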

Evaluation 

Our experimental hardware environment is shown in Table 1.


TABLE 1 THE HARDWARE ENVIRONMENT OF TE-CDP

| System Layer | Equipment Type | Hardware Configuration |
|---|---|---|
| Application Layer | Infocore Tigler SC8200 | Specification: 2U; Processor: 2.13GHz (Dual-Core); Memory: 32GB; Capacity: 600GB |
| Link Layer | Voltaire InfiniBand | Specification: 1U; Link Module: 24 DDR ports |
| Scheduling Protective Layer | IBM System x3850 X5 | Specification: 4U; Processor: 2.40GHz (10 cores); Memory: 32GB; Capacity: 600GB |
| Physical Storage Layer | Sugon DS-2120FA6TB | Specification: 2U; Capacity: 2TB |


We install ESXi 5.3 on the application layer and use a virtual machine running RedHat 6.2 as the source side of backup. The backup server runs the infonix 6.5.1-4 operating system.

We compare the time taken to add a backup for a 50GB disk and a 100GB disk, testing each disk 10 times. The results are shown in Table 2.


TABLE 2 TIME TO ADD A BACKUP

| Run | 50GB disk (s) | 100GB disk (s) |
|---|---|---|
| 1 | 4.058 | 3.760 |
| 2 | 3.706 | 3.502 |
| 3 | 3.840 | 3.814 |
| 4 | 3.957 | 4.008 |
| 5 | 3.720 | 3.881 |
| 6 | 3.747 | 3.677 |
| 7 | 3.921 | 3.804 |
| 8 | 3.658 | 3.883 |
| 9 | 3.638 | 3.588 |
| 10 | 3.863 | 3.575 |
| Average | 3.811 | 3.749 |


As Table 2 shows, adding a backup took about 3.811 s on average for the 50GB disk and about 3.749 s for the 100GB disk.

We also compare the snapshot recovery time for the two disks, using the snapshot with ID 0. The results are shown in Table 3.

As Table 3 shows, the average recovery time is about 1.661 s for the 50GB disk and about 1.598 s for the 100GB disk.


TABLE 3 SNAPSHOT RECOVERY TIME

| Run | 50GB disk (s) | 100GB disk (s) |
|---|---|---|
| 1 | 1.896 | 1.582 |
| 2 | 1.572 | 1.584 |
| 3 | 1.544 | 1.623 |
| 4 | 1.589 | 1.659 |
| 5 | 1.836 | 1.706 |
| 6 | 1.678 | 1.536 |
| 7 | 1.711 | 1.574 |
| 8 | 1.706 | 1.571 |
| 9 | 1.536 | 1.581 |
| 10 | 1.541 | 1.562 |
| Average | 1.661 | 1.598 |


In summary, our system can add a backup and recover a snapshot within seconds, and in our tests disk size (50GB versus 100GB) had no noticeable effect on the time TE-CDP takes to add a backup or recover a snapshot.
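To check how far apart the two disk sizes really are, the per-run timings from Tables 2 and 3 can be compared with Welch's t statistic. This is our own addition, not an analysis from the original experiments; with only ten runs per disk, a small |t| indicates no detectable difference rather than proof that disk size has no effect.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List, Tuple

# Per-run timings in seconds, copied from Tables 2 and 3.
ADD_BACKUP: Dict[str, List[float]] = {
    "50GB": [4.058, 3.706, 3.840, 3.957, 3.720, 3.747, 3.921, 3.658, 3.638, 3.863],
    "100GB": [3.760, 3.502, 3.814, 4.008, 3.881, 3.677, 3.804, 3.883, 3.588, 3.575],
}
RECOVERY: Dict[str, List[float]] = {
    "50GB": [1.896, 1.572, 1.544, 1.589, 1.836, 1.678, 1.711, 1.706, 1.536, 1.541],
    "100GB": [1.582, 1.584, 1.623, 1.659, 1.706, 1.536, 1.574, 1.571, 1.581, 1.562],
}


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t statistic for the difference between two sample means."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def summarize(runs: Dict[str, List[float]]) -> Tuple[float, float, float]:
    """Mean for each disk size plus Welch's t for 50GB versus 100GB."""
    small, large = runs["50GB"], runs["100GB"]
    return mean(small), mean(large), welch_t(small, large)


for label, runs in (("add backup", ADD_BACKUP), ("snapshot recovery", RECOVERY)):
    m_small, m_large, t = summarize(runs)
    print(f"{label}: 50GB {m_small:.3f} s, 100GB {m_large:.3f} s, t = {t:.2f}")
```

The computed means reproduce the averages reported in Tables 2 and 3. Comparing |t| with the two-sided 5% critical value of the t distribution (roughly 2.1 to 2.2 for these sample sizes) is one quick way to judge whether the gap between the two averages exceeds run-to-run noise.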

Conclusion and Future Work

This paper presents a backup system called TE-CDP. The system uses the snapshot flow method to solve the problem of wasted storage space and the single mirroring method to ensure real-time data protection while data is being synchronized. In addition, it uses the consistency agent method of CDP to ensure the consistency of snapshot data.

The experimental results show that the system can add a backup and restore a snapshot within seconds, and that its efficiency was not noticeably affected when the disk size changed from 50GB to 100GB.

The system also has a drawback: when it receives multiple commands, the performance of the backup server degrades. In future work, we will present an adaptive algorithm that considers both server performance and backup efficiency.